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ABSTRACT 

Sparse modeling is a powerful framework for data analysis and 
processing. Traditionally, encoding in this framework is done 
by solving an £± -regularized linear regression problem, usu- 
ally called Lasso. In this work we first combine the sparsity - 
inducing property of the Lasso model, at the individual fea- 
ture level, with the block- sparsity property of the group Lasso 
model, where sparse groups of features are jointly encoded, 
obtaining a sparsity pattern hierarchically structured. This re- 
sults in the hierarchical Lasso, which shows important practi- 
cal modeling advantages. We then extend this approach to the 
collaborative case, where a set of simultaneously coded signals 
share the same sparsity pattern at the higher (group) level but 
not necessarily at the lower one. Signals then share the same 
active groups, or classes, but not necessarily the same active 
set. This is very well suited for applications such as source sep- 
aration. An efficient optimization procedure, which guarantees 
convergence to the global optimum, is developed for these new 
models. The underlying presentation of the new framework and 
optimization approach is complemented with experimental ex- 
amples and preliminary theoretical results. 

1. INTRODUCTION AND MOTIVATION 

In addition to being very attractive at the theoretical level, 
sparse signal modeling has been shown to lead to numerous 
state-of-the-art results in signal processing. The standard model 
assumes that a signal can be efficiently represented by a sparse 
linear combination of atoms from a given or learned dictionary. 
The selected atoms form what is usually referred to as the active 
set, whose cardinality is significantly smaller than the size of 
the dictionary and the dimension of the signal. In recent years, 
it has been shown that adding structural constraints to this ac- 
tive set has value both at the level of representation robustness 
and at the level of signal interpretation (in particular when the 
active set indicates some physical properties of the signal), see 
|1| and references therein. This leads to group or structured 
sparse coding, where instead of considering the atoms as sin- 
gletons, the atoms are grouped, and a few groups are active at 
a time. An alternative way to add structure (and robustness) to 
the problem is to consider the simultaneous encoding of mul- 
tiple signals, requesting that they all share the same active set. 
This is a natural collaborative filtering approach to sparse cod- 
ing, see |2| and references therein. 

In this work we extend these models in a number of di- 
rections. First, we present a hierarchical sparse model, where 
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not only a few (sparse) groups of atoms are active at a time, 
but also each group enjoys internal sparsityQ At the conceptual 
level, this means that the signal is represented by a few groups 
(classes), and inside each group only a few members are ac- 
tive at a time. A simple example of this is a piece of music 
(numerous applications in genomics), where only a few instru- 
ments are active at a time (each instrument is a group), and the 
actual music played by the instrument is efficiently represented 
by a few atoms of the sub-dictionary/group corresponding to it. 
Thereby, this proposed hierarchical sparse coding framework 
permits to efficiently perform source separation, where the in- 
dividual sources (classes/groups) that generated the signal are 
identified at the same time as their efficient representation is 
reconstructed (the sparse code inside the group). An efficient 
optimization procedure is proposed to solve this hierarchical 
sparse coding framework. 

Then, we go a step beyond this. Imagine now that we have 
multiple recordings of the same two instruments (or different 
time windows of the same recording), each time playing differ- 
ent songs. Then, if we apply this new hierarchical sparse coding 
approach collaboratively, we expect that the different record- 
ings will share the same groups (since they are of the same 
instruments), but each will have its unique sparsity pattern in- 
side the group (since each recording is a different melody). We 
propose a collaborative hierarchical sparse coding framework 
addressing exactly thisj^] An efficient optimization procedure 
for this case is derived as well. 

In the remainder of this paper, we introduce these new 
models and their corresponding optimization, present examples 
illustrating them, and provide possible directions of research 
opened by these new frameworks, including some theoretical 
ones. 

2. COLLABORATIVE HIERARCHICAL CODING 

2.1. Background: Lasso and group Lasso 

Assume we have a set of data samples £ M m , j = 1 , . . . , n, 
and a dictionary of p atoms, assembled as a matrix D £ R m xp , 
D = [dich . . . dp]. Each sample x^ can be written as x^ = 
Daj + e, &j £ R p , e £ M m , that is, as a linear combination 
of the atoms in the dictionary D plus some perturbation e. The 

1 While we here consider only 2 levels of sparsity, the proposed 
framework is easily extended to multiple levels. 

2 Note that different recordings can also have different instruments, 
so some of them will share the same groups while not necessarily all of 
them will be exactly the same. 



basic underlying assumption in sparse coding is that, for all or 
most j, the optimal reconstruction a^ has only a few nonzero 
elements. Formally, if we define the cost £o as the pseudo-norm 
counting the number of nonzero elements of a^ , 1 1 a, 1 1 := \{k : 
akj 7^ 0}|, we expect that ||aj|| <C p for all or most j. The 
£o optimization is non-convex and known to be NP-hard, so a 
convex approximation to it is considered instead, which uses 
the £\ norm cost, 

minllall-L s.t. \\xj — Da^ < e. (2.1) 

a 

The above approximation is known as the Lasso [3 1. A popular 
variant is to use the unconstrained version 

1 2 

min — ||xj — Da|| 2 + A ||a.|| 1 , (2.2) 

a 2 

where A is a parameter usually found by cross-validation. 

The H'llj regularizer induces sparsity in the solution a^. 
This is desirable not only from a regularization point of view, 
but also from a model selection point, where one wants to iden- 
tify the relevant features or factors (atoms) that conform each 
sample Xj. In many situations, however, one wants to repre- 
sent the relevant factors not as single atoms but as groups of 
atoms. Given a dictionary of p atoms, we define groups through 
their indexes, g C {1, . . . Given a group g, we define the 
subset of atoms of D belonging to it as Dp, and the corre- 
sponding set of linear reconstruction coefficients as a. g . Define 
Q = {#i, . . . ,g\g\} to be apartitionof {1, . . . ,p}. The group 
Lasso problem was introduced in (4) as 

min i ||xj - DaHj + A^g(a), (2.3) 

a 2 

where i/jg is the group Lasso regularizer defined in terms of Q 
as ipg(a) — ^2 ge g Il a <?ll2- Note tnat can be seen as an 
£\ on Euclidean norms of the vectors formed by coefficients 
belonging to the same group a^ . This is a generalization of the 
i\ regularizer, as the latter arises from the special case Q — 
{1,2,..., p}, and, as such its effect on the groups of a is also 
a natural generalization of the one obtained with the Lasso: it 
"turns on" or "off" atoms in groups. 

2.2. The Hierarchical Lasso 

The group Lasso trades sparsity at the single-coefficient level 
with sparsity at a group level, while, inside each group, the 
solution is dense (actually it reduces to a least squares within 
the group). As we are interested in maintaining the sparsity at 
the coefficient level, we simply re-introduce the £\ regularizer 
together with the group regularizerjeading to the proposed Hi- 
erarchical Lasso (HiLasso) model|j 

min - || Xj - Da||2 + A 2 ^(a) + Ai Ha^ . (2.4) 

a Z 

We refer to this regularizer as the £2 + £i^\ In Section [3] we 
propose an efficient optimization for ( |2.4| ), while in Section [4] 
we experimentally show the virtues of this model. 

3 While preparing the camera ready version of this work we leaned 
of a simultaneously developed paper, 1 5 1, that also proposed this model, 
with a different optimization approach. The collaborative framework 
presented next is not developed in |5 |. See also |6 |. 

4 We can similarly define a hierarchical sparsity model based on £q . 



2.3. Collaborative Hierarchical Lasso 

In numerous applications, one expects that certain collections 
of samples Xj share the same active components from the dic- 
tionary, that is, that the indexes of the nonzero coefficients in 
a.j are the same for all the samples in the collection. Impos- 
ing such dependency in the £\ regularized regression problem 
gives rise to the so called collaborative (also called "multitask" 
or "simultaneous") sparse coding problem (HQ. 

More specifically, if we consider the matrix of coefficients 
.A. — ai , . . . , a n 1 associated to the reconstruction of the sam- 
ples X = [xi, . . . , x n ], the collaborative sparse coding model 
is given by 

1 p ii ii 

min- ||X-DA|| 2 F + A V a fe , (2.5) 

a 2 II 2 

k=l 

where a fe is the k-th row of A, that is, the vector of the n differ- 
ent values that the coefficient associated to the k-th atom takes 
for each sample j. If we now extend this idea to the group 
Lasso, we obtain a collaborative group Lasso formulation, 

min i||X-DA|£ + A^(A), (2.6) 

a Z 

where the regularizer ipg for a matrix is defined as ipg (A) = 
J2 9 eg \\A g \\ F , being A^ the submatix formed by all the rows 
belonging to group chose this notations since this regu- 

larizer is the natural extension of the regularizer in ( |2.3| > for the 
collaborative case. 

To the best of our knowledge, this combination has not yet 
been investigated in the literature. In this paper we are moving 
one step forward and treat this together with the hierarchical ex- 
tension. The combined model that we propose for this problem 
(C-HiLasso) can be written as follows 

1 n 
min - ||X - DA||^ + X 2 ipg(A) + ^ Ai ||a, ||, . (2.7) 

j=i 

The collaborative group Lasso is a particular case of our model 
when Ai is zero. On the other hand, one can obtain indepen- 
dent Lasso for each x» by setting A2 to zero. This new formu- 
lation is particularly well suited when the vectors have missing 
components. In this case combining the information from all 
the samples is very important in order to lead to a correct rep- 
resentation and model (group) selection. This can be done by 
slightly changing the data term in \2.6\ . For each data vector Xj 
one computes the reconstruction error using only the observed 
elements. Note that the missing components do not affect the 
other terms of the equation. 

3. OPTIMIZATION 
3.1. Single-signal problem: HiLasso 

In the last decade, optimization of problems of the form of ( |2.2| ) 
and ([23} have been deeply studied and there exist very efficient 

5 While the introduced collaborative HiLasso model is more general, 
we consider the separable case for the optimization here developed. 



algorithms for solving them. Recently, Wright et. al |8| pro- 
posed a framework, SpaRSA, for solving the general problem 



min/(a) + A^(a) 



(3.8) 



under reasonable assumptions. To guarantee convergence / 
needs to be a smooth and convex function while ip only needs 
to be finite in R n . When the regularizer, ip, is group separable, 
the optimization can be subdivided into smaller problems, one 
per group. The framework becomes powerful when these sub- 
problems can be solved efficiently. This is the case of the Lasso 
and group Lasso settings but is not immediate when the regular- 
izer is the proposed £\ + £2 norm. In this work we combine the 
SpaRSA with the Alternating Direction Method of Multipliers 
1 9 1 (ADMOM), to efficiently solve the HiLasso problem. 

The SpaRSA algorithm generates a sequence of iterates 
{x^t^N that, under certain conditions, converges to the solu- 
tion of (|3.8|>. At each iteration, x t+1 is obtained solving 



V/(x* 



a' 



Iz-x'IL+A^z), (3.9) 



for some sequence of parameters {a*}teN with a* G M + . The 
conditions for which the algorithm converges depend on the 
choice of a*, see | 8 1 for details. 



It is easy to show that \3.9\ is equivalent to 



+ 



A 



^(z) 



(3.10) 



where = x* — V/(x* ) . In this new formulation, it is clear 
that the first term in the cost function can be separated element- 
wise. Thus when the regularization function ip(z) is group sep- 
arable, so is the overall optimization, and one can solve ( |3.10| > 
independently for each group, 



1 



1 1 1 2 A . , N 

U s|l2+^(^) 5 



which in the case of HiLasso, this becomes, 



mm - 



+ ■ 



l|b|| 2 + 



Ai 



(3.11) 



where w 



and u 



a* - ^D T (Da* - x). This is a 
SOCP for which one could use generic solvers. However, this 
subproblem needs to be solved many times within the SpaRSA 
iterations, so it is crucial to solve it efficiently. For this we use 
the ADMOM method |9|. The idea is to solve the artificially 
constrained equivalent problem, 



min - lib - 

b 2 11 



■ W l| 2 + ^ 2 



+ Ai llbll 



s.t. b = P, 



where A; = Xi/a 1 . The algorithm generates a set of iterates 
{b* , p l , p* } t eN + which converges to the minimum of the Aug- 
mented Lagrangian of the problem 



i c (b,ftp) 



||b-w||^ + A 2 ||b|| 2 + A 1 ||b|| 1 



+ p T (b-/3) + ^||b- 



where the elements of p are the so called Lagrangian multipli- 
ers, and c is a fixed constant. At each iteration, the variables b 
and P are updated, one at a time, by minimizing the Augmented 
Lagrangian while letting the remaining fixed: 



b t+1 =argmin^- ||b — + Ai Hb^ + b T p 
b 2 



+ iib. 



II2, 



(3.12) 



P =argmin A 2 
P t+1 = P + c (b t+1 -/3 t+1 



2 -^p+-||b 



t+i 



For convenience in the notation we omitted the super-indexes 
for the iterates at step t, just explicitly indexing them at step 
t + 1. The update for b is separable into scalar subproblems 
on the coordinates of b. The optimality conditions on the sub- 
gradient of each of this scalar problems leads to a simple vari- 
ant of the well known soft- thresholding operator, S(wi, A) = 
sgn(wi) max {0, \ wi\ — A}. For convenience, we use the nota- 
tion «S(w, A) to denote the vector obtained when applying the 
soft- thresholding operator (with parameter A) to each element 
of w. On the other hand, the update for p is not separable into 
scalar subproblems. However its optimality condition is given 
by P' + A 2 <9 \\P'\\ 2 — b ; 3 0, which is exactly the one leading 
to the vector shrinkage operator, S v , described in |4) for the 
group Lasso (actually much simpler, since there is no matrix 
multiplication involved): 



A 2 



S v (h, A 2 ) 



Then both updates can be written in closed form and computed 
very efficiently: 

b —J— <S(w + c/3- p, Ai), P = - <S4p+ cb, A 2 ). 
c+1 c 

The algorithm is very robust and converges in very few iter- 
ations to its optimum, thereby obtaining a very efficient ap- 
proach to solve the subproblem ( |3.11| >. The SpaRSA framework 
then becomes a very interesting approach for the proposed Hi- 
Lasso. The complete algorithm is summarized in Algorithm[T] 
An additional speed up is obtained by bypassing ADMOM when 
a whole group is not active. From the optimality conditions of 
it follows that, if is a solution when Ai = (standard 
group Lasso), it is also a solution in the general case. This can 
be simply checked by evaluating S v ( w , A 2 ) > 0. 

3.2. Optimization of the Collaborative HiLasso 

We now propose an optimization algorithm to efficiently solve 
the collaborative HiLasso. The main idea is to use ADMOM 
to divide the overall problem into two subproblems: one that 
breaks the multi-signal problem into n single-signal £1 regres- 
sions, and another that treats the multi- signal case as a single 
group Lasso-like problem. In this way we take advantage of 
the separability of each term as shown in Figure [T] We define a 
constrained optimization problem, 



1 



iX-DA^+Ai^lla^ + Aa^MB) s.t. A= 



Result: The optimal point x* 
Sett := 0; 

Choose a factor 77 > 1 and constants c > and 

< c^min < a max ; 

Choose an initial x(0) = (xi, X2, . . . , X|g|); 
while stopping criterion is not satisfied do 
Choose a* e [a min , a mSLX \; 
Setu* <- x* - J^V/(x*); 
while stopping criterion is not satisfied do 

% Here we use the group separability of (3.10) and 
% solve (3.11) for each group; 
for i = ltb\G\ do 

if S v (w, A 2 ) > Othen 
Setr := 0; 

Choose an initial p° , (3° , b° ; 

while stopping criterion is not satisfied do 

b r+1 = ^r<S(uf + c /r-p r ,Ai); 

/3 r+1 = |5 v (p r +cb r + 1 ,A 2 ); 



column coupling 



band coupling no coupling 



else 

end 
end 

Set ol 1 <- 



Set r «— r + 1 ; 

end 

Setx* +1 -b^ 1 ; 



-c(b r 



/3 r+1 ); 



Setx* +1 := 



T]OL 



0; 



end 



end 

Set t <- £ + 1 ; 

Algorithm 1: HiLasso optimization algorithm. 



The ADMOM iterations are given by (we omitted the super- 
index for variables at iteration t for notation convenience). 



=argmin- ||X - 
a 2 



DAII 



+ *i^ll a j|li + 



Tr(A T P 



^ + £||B- 



D t+1 . c 

B =argmin- 
b 2 



B — A 

+ A 2 ^(B), 
=P + c(A-B). 



+ Tr(B T P* 



(3.13) 



(3.14) 



Solving for A t+1 : Problem \3A3\ can be separated into n 
single- signal subproblems by updating one column of the ma- 
trix A at a time, 



mm - ||x- 



Da, 



+ Pj a j + o H a i- b ll 2 + A i ll a illi' 



This problem can be solved using the SpaRSA framework. The 
idea is to consider the first three terms of the cost as /(•) in 
Equation ( |3.8| >. The associated computational cost is equivalent 
to the one of the Lasso, since the regularizer is the standard £\ 
norm. 



Solving for B t+1 : The problem given by \3A4\ is group sepa- 
rable, as a direct consequence of the separability of ipg . Thus, 
we need to solve | Q | optimization problems of the form, 



B 



+ Tr(P< +1 Bj) + A 2 ||B 



D A 



Fig. 1. Structure of the problem in terms of coupling. 

where A 9 , B p and P 9 are the \g\ x n sub-matrices of A, B and 
P associated with the group g respectively. We express them 
as column vectors (each with \g\xn components) by concate- 
nating their columns, obtaining , f3 g and p g respectively, and 
rewrite the optimization problem in vectorial form as 



min A2 
b 



T, C 

P 9 b+- 



(3.15) 



g\\F ' 



This problem is identical to ( |3.12| ) and can be reduced to a 
group Lasso problem by simply changing variables and thus, 
it is solved using vectorial thresholding. 



4. EXPERIMENTAL RESULTS 

We start by comparing our model with the standard Lasso and 
group Lasso using synthetic data. We created \Q\ dictionaries, 
D^, with 64 atoms of dimension 64, with i.i.d. Gaussian entries. 
The columns were normalized to have unit £2 norm. Then we 
randomly chose two groups to be active at each time (on all the 
signals). Sets of N = 200 testing signals were generated, one 
per active group, as linear combinations of k <^ 64 elements of 
the dictionaries, xj = D^a*. These signals were also normal- 
ized. The mixtures were created by summing these signals and 
(eventually) adding gaussian noise of standard deviation a. The 
generated testing signals have a hierarchical sparsity structure 
and while they share groups, they do not necessarily share the 
sparsity pattern inside the groups. 

We built a single dictionary by concatenating the sub- 
dictionaries, D = [Di, . . . , D|g|], and use it to solve the 
Lasso, group Lasso, HiLasso and C-HiLasso problems. Table[T] 
summarizes the Mean Square Error (MSE) and Hamming dis- 
tance of the recovered coefficient vectors. We observe that our 
model is able to exploit the hierarchical structure of the data 
as well as the collaborative structure. From a modeling point 
of view, we observe that the group Lasso selects in general the 
correct blocks but it does not give a sparse solution within them. 
On the other hand, Lasso gives a solution that has nonzero el- 
ements belonging to groups that were not active in the original 
signal, leading to a wrong model selection. HiLasso gives a 
sparse solution that picks atoms form the correct groups but 
still presents some minor mistakes. For the collaborative case, 
in all the tested cases, no coefficients were selected outside the 
correct active groups and the recovered coefficients are consis- 
tently the best ones. This robustness comes from the fact that 
the active groups are collaboratively found using the informa- 
tion present in all the signals. We consider the USPS digits 
dataset that has been shown to be well represented in the sparse 
modeling framework [ 10 1. Here the signals are vectors contain- 
ing the unwrapped gray intensities of 16 x 16 images. We chose 
two digits and summed them up to create a mixture image. We 



a 


Lasso 


Glasso 


HiLasso 


C-HiLasso 


0.1 


41.7/ 22.0 


117.3/361.6 


33.0/19.8 


16.3 / 13.3 


0.2 


56.4/21.6 


118.2/378.3 


39.9/22.7 


24.9 /17.1 


0.4 


96.5 / 22.7 


137.8/ 340.3 


65.6/19.5 


59.5 /27.4 


k 


Lasso 


Glasso 


HiLasso 


C-HiLasso 


8 


38.8/22.0 


118.4/318.2 


27.2/19.5 


9.6 / 16.2 


12 


120.0/36.2 


116.6/350.4 


70.4 / 26.5 


41.3/29.1 


16 


164.1/43.9 


109.3/338.6 


110.0/32.2 


55.1/35.0 


101 


Lasso 


Glasso 


HiLasso 


C-HiLasso 


4 


108.0/27.8 


191.6/221.7 


100.9/29.8 


74.2 / 30.2 


8 


120.0/36.2 


116.6/350.4 


70.4/26.5 


41.3/29.1 



Table 1. Active sets mse (we show them multiplied by 10 3 ) and 
Hamming distance (MSE / Hamming) for the tested methods. In the 
first case we vary the noise level while we keep \Q\ =8 and k = 8 
fixed. In the two other tables the signals are noise free and we first set 
\Q\ =8 while changing k, and then set k = 12 while changing the 
number of groups. For each method the regularization parameters were 
the ones for which the best results where obtained. 

created 200 random mixture images and then analyzed them 
with the different methods. In this case there is no ground truth 
active set, and we used as a measure of performance the sepa- 
rating error defined as J2?=i J2jLi || x } _ || 2 » wnere x 5 
is the component corresponding to source i in the signal j, and 
xj is the recovered one. 

Using the usual training-testing split for USPS we first 
learned a dictionary for each digit. We then created a single 
dictionary by concatenating them. In Figure [2] we show the 
separation error obtained in different situations. As in the syn- 
thetic case, only the collaborative method was able to success- 
fully detect the true active sources. We show in Figure [2] some 
examples of the recovered active sets for each method. 

We also used the digits dataset to experiment with missing 
data. We randomly discarded an average of 60% of the pixels 
per mixed image and then applied the C-Hilasso. The algorithm 
is capable of correctly detect which digits are present in the im- 
ages. In Figure [3] we show some examples. Note that this is a 
quite different problem than the one commonly addressed in the 
matrix completion literature. Here we do not aim to recover sig- 
nals that all belong to a unique unknown sub- space, but signals 
that are the combination of two non-unique spaces to be auto- 
matically selected from the available dictionary. Such unknown 
spaces have common models/groups for all the signals in ques- 
tion (the coarse level of the hierarchy), but not necessarily the 
exact same atoms and therefore not necessarily belong to the 
same sub-spaces. Both levels of the hierarchy are automatically 
detected, e.g., that the groups are those corresponding to "3" 
and "5," and the exact atoms (sub-spaces) in each group, these 
last ones possibly different for each signal in the set. While we 
consider that the possible sub-spaces are to be selected from the 
provided dictionary, in Section [5] we discuss learning such dic- 
tionaries as well. In such case, the standard matrix completion 
problem becomes a particular case of the C-HiLasso framework 
(with a single group and all the signals having the same active 
set, sub- space, in the group), naturally opening numerous the- 
oretical questions for this new more general modelF] Finally, 



6 Prof. Carin and collaborators have new results on the case of a 
single group and signals in possible different sub-spaces of the group, 
an intermediate model between standard matrix completion and C- 
HiLasso (personal communication). 



Digits 


Lasso 


Glasso 


HiLasso 


C-HiLasso 


3+5 


74.1 


80.1 


68.6 


63.4 


3+5+n 


87.9 


95.4 


92.9 


77.3 


2+7 


61.1 


60.8 


58.7 


42.6 


2+7+n 


75.4 


65.2 


64.7 


53.7 



Fig. 2. . (Top) The table shows the separating errors (we show them 
multiplied by 10 3 ) for the digits dataset. We show the results for sep- 
arating digits 3 and 5, and 2 and 7, with and without additive noise of 
standard deviation a = 0.1. We used sets of 200 copies. (Bottom) Ac- 
tive sets recovered for the group Lasso, Lasso, HiLasso and C-HiLasso 
for a given example. Each block corresponds to the coefficients as- 
sociated with the digits displayed bellow. The active coefficients are 
displayed in read. Only C-HiLasso manages to perfectly recover the 
correct models (with the lowest separating error), while HiLasso per- 
forms very well also. 

we used C-HiLasso to separate overlapping textures in an im- 
age. We chose 8 textures form the Brodatz dataset and trained 
one dictionary for each one of them (these form the 8 groups of 
the dictionary). Then we created an image as the sum of two 
textues (the testing images were not used in the training stage). 
In Figure [4] we show results. The overall group Hamming dis- 
tance obtained for C-HiLasso is 0.003, showing that the correct 
groups, and only them, were practically selected all the time. 

5. DISCUSSION 

In this paper we have introduced a new framework of collab- 
orative hierarchical sparse coding, where multiple signals col- 
laborate in their encoding, sharing code groups (models) and 
having (possible disjoint) sparse representations inside the cor- 
responding groups. An efficient optimization approach was de- 
veloped, which guarantees convergence to the global minimum, 
and examples illustrating the power of this framework were pre- 
sented. At the practical level, we are currently working on the 
applications of this proposed framework in a number of direc- 
tions, including collaborative instruments separation in music; 
and signal classification, following the demonstrated capability 
to collectively select the correct groups/models. 

At the theoretical level, a whole family of new problems 
is opened by this proposed framework. A critical one is the 
overall capability of selecting the correct groups and thereby 
of performing correct model selection and source separation. 
Let us consider for example the case of only two groups (so 
no sparsity at the group level) and a single signal composed 
by the linear combination of atoms from each group. Then, it 
is easy to show that the cross-mutual coherence between the 
groups plays a critical role. Let us call /i^, i — 1, 2, the internal 
























Fig. 3. . (Top) We show two examples (one per row) of the recov- 
ered digits from a mixture with 60% of missing components. We first 
show the original mixture image, then the image with the missing pix- 
els highlighted in red, and finally the digits recovered. (Bottom) Here 
we show a comparision of the active sets recovered using the Lasso 
(left) and the C-HiLasso (right) methods. The active sets for the set of 
signals (as shown in Figure [2j are placed as columns. The coefficients 
corresponding to digits 3 and 5 fall inside the area delimited by the red 
horizontal lines. While C-HiLasso recovers the correct sources in all 
the cases, the Lasso method makes several mistakes. 

coherence of the atoms of the group i, and \i 1,2 the one between 
the groups (maximal normalized correlation between an atom 
of group 1 with an atom of group 2). Then it is easy to show 
that uniqueness of the separation can be guaranteed if (2k\ — 
+ 2/c/ii,2 < 1 and (2/C2 — l)/i2 + 2/ci/ii,2 < 1, with ki 
the respective sparsity levels inside each group (this is a weaker 
bound that the more stringent one developed by [11]). 

This needs to be extended to actual sparsity at the group 
level and to the collaborative case. Note of course that con- 
sidering a single active group is a particular case of our model 
(see fTOl for works in this case), thereby an overall theoretical 
framework for our proposed collaborative hierarchical frame- 
work will automatically include numerous of the existing re- 
sults in sparse coding. 

Finally, we have also developed a framework for learn- 
ing the dictionary for collaborative hierarchical sparse coding, 
meaning the optimization is simultaneously on the dictionary 
and the code. As it is the case with standard dictionary learn- 
ing, this is expected to lead to significant performance improve- 
ments (again, see ifTOl for the particular case of this with a sin- 
gle group active at a time). 
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Fig. 4. . Results for the texture segmentation. One example of the 
mixture and the C-HiLasso separated textures are shown. This is fol- 
lowed by the active set diagram (as in Figure [3j, Lasso on top (with 
class selection wrongly all over the 8 textures) and C-HiLasso on bot- 
tom, where only the 2 corresponding groups are selected. 
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